AITopics | sensitivity and specificity


Performance of the SafeTerm AI-Based MedDRA Query System Against Standardised MedDRA Queries

Vandenhende, Francois, Georgiou, Anna, Georgiou, Michalis, Psaras, Theodoros, Karekla, Ellie, Hadjicosta, Elena

arXiv.org Artificial Intelligence, Dec-9-2025

In pre-market drug safety review, grouping related adverse event terms into SMQs or OCMQs is critical for signal detection. We assess the performance of SafeTerm Automated Medical Query (AMQ) on MedDRA SMQs. The AMQ is a novel quantitative artificial intelligence system that understands and processes medical terminology and automatically retrieves relevant MedDRA Preferred Terms (PTs) for a given input query, ranking them by a relevance score (0-1) using multi-criteria statistical methods. The system (SafeTerm) embeds medical query terms and MedDRA PTs in a multidimensional vector space, then applies cosine similarity and extreme-value clustering to generate a ranked list of PTs. Validation was conducted against tier-1 SMQs (110 queries, v28.1). Precision, recall and F1 were computed at multiple similarity thresholds, defined either manually or using an automated method. High recall (94%) is achieved at moderate similarity thresholds, indicative of good retrieval sensitivity. Higher thresholds filter out more terms, resulting in improved precision (up to 89%). The optimal threshold (0.70) yielded an overall recall of 48% and a precision of 45% across all 110 queries. Restricting to narrow-term PTs achieved slightly better performance at an increased (+0.05) similarity threshold, confirming increased relatedness of narrow versus broad terms. The automatic threshold selection (0.66) prioritizes recall (0.58) over precision (0.29). SafeTerm AMQ achieves comparable, satisfactory performance on SMQs and sanitized OCMQs. It is therefore a viable supplementary method for automated MedDRA query generation, balancing recall and precision. We recommend using suitable MedDRA PT terminology in query formulation and applying the automated threshold method to optimise recall. Increasing the similarity threshold allows a more refined selection of narrow terms.
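The threshold trade-off described in this abstract can be illustrated with a minimal Python sketch. It is not SafeTerm's implementation: the two-dimensional vectors, the term names, the reference set, and the function names `cosine`, `retrieve`, and `precision_recall_f1` are all hypothetical, and the extreme-value clustering step is omitted entirely.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query_vec, term_vecs, threshold):
    """Return (term, score) pairs scoring at or above threshold, best first."""
    scored = [(term, cosine(query_vec, vec)) for term, vec in term_vecs.items()]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

def precision_recall_f1(retrieved, reference):
    """Precision, recall and F1 of a retrieved term set against a reference set."""
    retrieved, reference = set(retrieved), set(reference)
    true_pos = len(retrieved & reference)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical toy embeddings (assumption: real embeddings are high-dimensional).
query = [1.0, 0.0]
terms = {
    "term_a": [1.0, 0.0],  # cosine 1.0
    "term_b": [1.0, 1.0],  # cosine about 0.707
    "term_c": [1.0, 2.0],  # cosine about 0.447
    "term_d": [0.0, 1.0],  # cosine 0.0
}
reference = {"term_a", "term_b"}  # hypothetical gold-standard term list

for threshold in (0.4, 0.7, 0.9):
    names = [term for term, _ in retrieve(query, terms, threshold)]
    p, r, f1 = precision_recall_f1(names, reference)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data, raising the threshold from 0.4 to 0.7 removes the one irrelevant term and lifts precision, while raising it further to 0.9 drops a relevant term and halves recall.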

artificial intelligence, machine learning, natural language, (16 more...)

2512.07552

Country:

North America > United States (0.15)
Europe > Middle East > Cyprus (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.50)

Cohort-attention Evaluation Metric against Tied Data: Studying Performance of Classification Models in Cancer Detection

Wei, Longfei, Sheng, Fang, Zhang, Jianfei

arXiv.org Machine Learning, Mar-16-2025

Artificial intelligence (AI) has significantly improved medical screening accuracy, particularly in cancer detection and risk assessment. However, traditional classification metrics often fail to account for imbalanced data, varying performance across cohorts, and patient-level inconsistencies, leading to biased evaluations. We propose the Cohort-Attention Evaluation Metrics (CAT) framework to address these challenges. CAT introduces patient-level assessment, entropy-based distribution weighting, and cohort-weighted sensitivity and specificity. Key metrics like CATSensitivity (CATSen), CATSpecificity (CATSpe), and CATMean ensure balanced and fair evaluation across diverse populations. This approach enhances predictive reliability, fairness, and interpretability, providing a robust evaluation method for AI-driven medical screening models.
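To make the idea of cohort-weighted sensitivity and specificity concrete, here is a minimal Python sketch. It is not the CAT framework: the abstract does not give the CATSen or CATSpe formulas, so the equal cohort weights, the function names `sens_spec` and `cohort_weighted`, and the toy labels are all assumptions made for illustration. The sketch also assumes every cohort contains at least one positive and one negative case.

```python
from collections import defaultdict

def sens_spec(y_true, y_pred):
    """Sensitivity and specificity from binary labels (1 = disease, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def cohort_weighted(y_true, y_pred, cohorts, weights=None):
    """Weighted average of per-cohort sensitivity and specificity.

    weights maps cohort -> weight; equal weights are used by default
    (an illustrative assumption, not the paper's entropy-based weighting).
    """
    groups = defaultdict(lambda: ([], []))
    for t, p, c in zip(y_true, y_pred, cohorts):
        groups[c][0].append(t)
        groups[c][1].append(p)
    if weights is None:
        weights = {c: 1 / len(groups) for c in groups}
    per_cohort = {c: sens_spec(*groups[c]) for c in groups}
    sens = sum(weights[c] * per_cohort[c][0] for c in groups)
    spec = sum(weights[c] * per_cohort[c][1] for c in groups)
    return sens, spec

# Hypothetical data: a large cohort "A" classified perfectly,
# and a small cohort "B" whose single positive case is missed.
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]
cohorts = ["A", "A", "A", "A", "A", "A", "B", "B"]

pooled = sens_spec(y_true, y_pred)
averaged = cohort_weighted(y_true, y_pred, cohorts)
print("pooled:", pooled)
print("cohort-averaged:", averaged)
```

Pooling all patients hides the failure on the small cohort, whereas averaging per cohort exposes it; this is the kind of cross-cohort imbalance the abstract says traditional metrics overlook.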

artificial intelligence, machine learning, specificity, (17 more...)

2503.12755

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Sample size determination for machine learning in medical research

Arifin, Wan Nor, Yaacob, Najib Majdi

arXiv.org Artificial Intelligence, Mar-4-2025

Machine learning (ML) methods are being increasingly used across various domains of medical research. However, despite advancements in the use of ML in medicine, clear and definitive guidelines for determining sample sizes in medical ML research are lacking. This article proposes a method for determining sample sizes for medical research utilizing ML methods, beginning with the determination of the testing set sample size, followed by the determination of the training set and total sample sizes.

Introduction: Machine learning (ML) methods are being increasingly used in medical research, spanning various domains of medicine, including oncology, orthopaedics, ophthalmology and general practice (Sirocchi et al., 2024). However, despite this advancement in medical research, there are currently no clear and definitive guidelines for determining sample sizes when using ML methods in the medical domain.

determination, medical research, sample size, (13 more...)

2503.05809

Country:

Asia > Malaysia (0.08)
North America > United States > New York (0.05)

Genre: Research Report (0.65)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

AGE2HIE: Transfer Learning from Brain Age to Predicting Neurocognitive Outcome for Infant Brain Injury

Bao, Rina, He, Sheng, Grant, Ellen, Ou, Yangming

arXiv.org Artificial Intelligence, Nov-7-2024

Hypoxic-Ischemic Encephalopathy (HIE) affects 1 to 5 out of every 1,000 newborns, with 30% to 50% of cases resulting in adverse neurocognitive outcomes. However, these outcomes can only be reliably assessed as early as age 2. Therefore, early and accurate prediction of HIE-related neurocognitive outcomes using deep learning models is critical for improving clinical decision-making, guiding treatment decisions and assessing novel therapies. A major challenge in developing deep learning models for this purpose is the scarcity of large, annotated HIE datasets. We have assembled the first and largest public dataset; however, it contains only 156 cases with 2-year neurocognitive outcome labels. In contrast, we have collected 8,859 normal brain Magnetic Resonance Images (MRIs) from subjects aged 0-97 years that are available for brain age estimation using deep learning models. In this paper, we introduce AGE2HIE to transfer knowledge learned by deep learning models from healthy-control brain MRIs to a diseased cohort, from structural to diffusion MRIs, from regression of continuous age estimation to prediction of the binary neurocognitive outcomes, and from lifespan age (0-97 years) to infant (0-2 weeks). Compared to training from scratch, transfer learning from brain age estimation significantly improves not only the prediction accuracy (3% or 2% improvement in same-site or multi-site settings, respectively), but also the model generalization across different sites (5% improvement in cross-site validation).

artificial intelligence, deep learning model, machine learning, (13 more...)

2411.05188

Country: North America > United States > Massachusetts (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Health Care Technology (0.94)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Diagnostic Performance of Deep Learning for Predicting Gliomas' IDH and 1p/19q Status in MRI: A Systematic Review and Meta-Analysis

Farahani, Somayeh, Hejazi, Marjaneh, Tabassum, Mehnaz, Di Ieva, Antonio, Mahdavifar, Neda, Liu, Sidong

arXiv.org Artificial Intelligence, Oct-28-2024

Gliomas, the most common primary brain tumors, show high heterogeneity in histological and molecular characteristics. Accurate molecular profiling, such as isocitrate dehydrogenase (IDH) mutation and 1p/19q codeletion status, is critical for diagnosis, treatment, and prognosis. This review evaluates the efficacy of MRI-based deep learning (DL) models in predicting these biomarkers. Following PRISMA guidelines, we systematically searched major databases (PubMed, Scopus, Ovid, and Web of Science) up to February 2024, screening studies that utilized DL to predict IDH and 1p/19q codeletion status from MRI data of glioma patients. We assessed the quality and risk of bias using the radiomics quality score and the QUADAS-2 tool. Our meta-analysis used a bivariate model to compute pooled sensitivity and specificity, and meta-regression to assess inter-study heterogeneity. Of the 565 articles, 57 were selected for qualitative synthesis, and 52 underwent meta-analysis. The pooled estimates showed high diagnostic performance, with validation sensitivity, specificity, and area under the curve (AUC) of 0.84 [prediction interval (PI): 0.67-0.93, I2=51.10%, p < 0.05], 0.87 [PI: 0.49-0.98, I2=82.30%, p < 0.05], and 0.89 for IDH prediction, and 0.76 [PI: 0.28-0.96, I2=77.60%, p < 0.05], 0.85 [PI: 0.49-0.97, I2=80.30%, p < 0.05], and 0.90 for 1p/19q prediction, respectively. Meta-regression analyses revealed significant heterogeneity influenced by glioma grade, data source, inclusion of non-radiomics data, MRI sequences, segmentation and feature extraction methods, and validation techniques. DL models demonstrate strong potential in predicting molecular biomarkers from MRI scans, with significant variability influenced by technical and clinical factors. Thorough external validation is necessary to increase clinical utility.

artificial intelligence, in-house, machine learning, (15 more...)

2411.02426

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Prognosis of COVID-19 using Artificial Intelligence: A Systematic Review and Meta-analysis

Motamedian, SaeedReza, Mohaghegh, Sadra, Oregani, Elham Babadi, Amjadi, Mahrsa, Shobeiri, Parnian, Cheraghi, Negin, Solouki, Niusha, Ahmadi, Nikoo, Mohammad-Rahimi, Hossein, Bouchareb, Yassine, Rahmim, Arman

arXiv.org Artificial Intelligence, Jul-31-2024

Purpose: Artificial intelligence (AI) techniques have been extensively utilized for the diagnosis and prognosis of several diseases in recent years. This study identifies, appraises and synthesizes published studies on the use of AI for the prognosis of COVID-19. Method: An electronic search was performed using Medline, Google Scholar, Scopus, Embase, Cochrane and ProQuest. Studies that examined machine learning or deep learning methods to determine the prognosis of COVID-19 using CT or chest X-ray images were included. Pooled sensitivity, specificity, area under the curve and diagnostic odds ratio were calculated. Result: A total of 36 articles were included; various prognosis-related issues, including disease severity, mechanical ventilation or admission to the intensive care unit, and mortality, were investigated. Several AI models and architectures were employed, such as the Siamese model, support vector machine, random forest, eXtreme Gradient Boosting, and convolutional neural networks. The models achieved 71%, 88% and 67% sensitivity for mortality, severity assessment and need for ventilation, respectively. Specificities of 69%, 89% and 89% were reported for the same outcomes. Conclusion: Based on the included articles, machine learning and deep learning methods used for the prognosis of COVID-19 patients using radiomic features from CT or CXR images can help clinicians manage patients and allocate resources more effectively. These studies also demonstrate that combining patient demographics, clinical data, laboratory tests and radiomic features improves model performance.

mortality, prognosis, sensitivity 0, (15 more...)

2408.00208

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Crowdsourcing with Difficulty: A Bayesian Rating Model for Heterogeneous Items

Han, Seong Woo, Adıgüzel, Ozan, Carpenter, Bob

arXiv.org Machine Learning, May-29-2024

In applied statistics and machine learning, the "gold standards" used for training are often biased and almost always noisy. Dawid and Skene's justifiably popular crowdsourcing model adjusts for rater (coder, annotator) sensitivity and specificity, but fails to capture distributional properties of rating data gathered for training, which in turn biases training. In this study, we introduce a general purpose measurement-error model with which we can infer consensus categories by adding item-level effects for difficulty, discriminativeness, and guessability. We further show how to constrain the bimodal posterior of these models to avoid (or if necessary, allow) adversarial raters. We validate our model's goodness of fit with posterior predictive checks, the Bayesian analogue of $\chi^2$ tests. Dawid and Skene's model is rejected by goodness of fit tests, whereas our new model, which adjusts for item heterogeneity, is not rejected. We illustrate our new model with two well-studied data sets, binary rating data for caries in dental X-rays and implication in natural language.

logit 1, probability, rater, (15 more...)

2405.19521

Country: North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.49)

Industry:

Health & Medicine (1.00)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.87)
(3 more...)

Real-Time Inference for a Gamma Process Model of Neural Spiking

Carlson, David, Carin, Lawrence

Neural Information Processing Systems, Mar-13-2024, 18:38:14 GMT

With simultaneous measurements from ever-increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state of the art. Via exploratory data analysis (using data with partial ground truth as well as two novel data sets), we find that several features of our model collectively contribute to our improved performance, including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain.

neuron, spike, waveform, (17 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Workflow (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

A Priori Determination of the Pretest Probability

Balayla, Jacques

arXiv.org Artificial Intelligence, Jan-8-2024

In this manuscript, we present various proposed methods to estimate the prevalence of disease, a critical prerequisite for the adequate interpretation of screening tests. To address the limitations of these approaches, which revolve primarily around their a posteriori nature, we introduce a novel method to estimate the pretest probability of disease, a priori, utilizing the Logit function from the logistic regression model. This approach is a modification of McGee's heuristic, originally designed for estimating the posttest probability of disease. In a patient presenting with $n_\theta$ signs or symptoms, the minimal bound of the pretest probability, $\phi$, can be approximated by: $\phi \approx \frac{1}{5}\ln\left[\prod_{\theta=1}^{i}\kappa_\theta\right]$ where $\ln$ is the natural logarithm, and $\kappa_\theta$ is the likelihood ratio associated with the sign or symptom in question.
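As a worked illustration of the approximation quoted in this abstract, the following minimal Python sketch evaluates one fifth of the natural logarithm of the product of the likelihood ratios. The function name `pretest_probability_bound` and the example likelihood ratios are hypothetical, and the sketch assumes the product runs over all of the patient's signs or symptoms; it applies the formula as stated and adds no clipping or validity range of its own.

```python
import math

def pretest_probability_bound(likelihood_ratios):
    """Approximate lower bound: (1/5) * ln(product of likelihood ratios)."""
    ratios = list(likelihood_ratios)
    if not ratios or any(lr <= 0 for lr in ratios):
        raise ValueError("need at least one strictly positive likelihood ratio")
    return math.log(math.prod(ratios)) / 5

# Hypothetical example: two findings with likelihood ratios 2.0 and 3.0.
phi = pretest_probability_bound([2.0, 3.0])
print(f"phi is approximately {phi:.3f}")  # ln(6) / 5, about 0.358
```

Because the logarithm of a product is the sum of the logarithms, each finding contributes one fifth of the log of its own likelihood ratio, so a finding with a likelihood ratio of 1 leaves the estimate unchanged.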

pretest probability, prevalence, probability, (16 more...)

2401.04086

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Missouri (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Epidemiology (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.67)

ChatGPT and post-test probability

Weisenthal, Samuel J.

arXiv.org Artificial Intelligence, Dec-23-2023

Reinforcement learning-based large language models, such as ChatGPT, are believed to have potential to aid human experts in many domains, including healthcare. There is, however, little work on ChatGPT's ability to perform a key task in healthcare: formal, probabilistic medical diagnostic reasoning. This type of reasoning is used, for example, to update a pre-test probability to a post-test probability. In this work, we probe ChatGPT's ability to perform this task. In particular, we ask ChatGPT to give examples of how to use Bayes rule for medical diagnosis. Our prompts range from queries that use terminology from pure probability (e.g., requests for a posterior of A given B and C) to queries that use terminology from medical diagnosis (e.g., requests for a posterior probability of Covid given a test result and cough). We show how the introduction of medical variable names leads to an increase in the number of errors that ChatGPT makes. Given our results, we also show how one can use prompt engineering to facilitate ChatGPT's partial avoidance of these errors. We discuss our results in light of recent commentaries on sensitivity and specificity. We also discuss how our results might inform new research directions for large language models.
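The pre-test to post-test update mentioned in this abstract is a direct application of Bayes' rule using a test's sensitivity and specificity. The minimal Python sketch below shows that calculation; the function name `post_test_probability` and the example numbers are hypothetical and are not taken from the paper's prompts. It assumes a pre-test probability strictly between 0 and 1.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Update a pre-test probability of disease given a binary test result."""
    if positive:
        # P(D | +) = sens * P(D) / (sens * P(D) + (1 - spec) * (1 - P(D)))
        numerator = sensitivity * pre_test
        denominator = numerator + (1 - specificity) * (1 - pre_test)
    else:
        # P(D | -) = (1 - sens) * P(D) / ((1 - sens) * P(D) + spec * (1 - P(D)))
        numerator = (1 - sensitivity) * pre_test
        denominator = numerator + specificity * (1 - pre_test)
    return numerator / denominator

# Hypothetical test: 90% sensitivity, 80% specificity, 10% pre-test probability.
after_positive = post_test_probability(0.10, 0.90, 0.80, positive=True)
after_negative = post_test_probability(0.10, 0.90, 0.80, positive=False)
print(f"after a positive result: {after_positive:.3f}")  # 0.09 / 0.27, about 0.333
print(f"after a negative result: {after_negative:.3f}")  # 0.01 / 0.73, about 0.014
```

Even with a fairly accurate test, a positive result here raises the probability of disease only to about one in three, because the pre-test probability is low.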

cough, covid, probability, (12 more...)

2311.12188

Country:

North America > United States > Arizona > Pima County (0.04)
Asia > Pakistan (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.76)
